The aim of this study is to characterize changes in the prevalence of resistant bacteria in a way that allows comparisons across bacteria, drugs and countries. For that, we have to acknowledge that the additional number of resistant strains in a given year compared to the previous one depends on:
Indeed, a few resistant strains with a high spreading capacity can produce the just as many additional resistant strains as many resistant strains with a low spreading capacity. This implies that a proper comparisons of the dynamics of change of the number of resistant strains needs to account for the current number of resistant strains. Furthermore, because even bacterial populations cannot grow indefinitely in size, the expected number of new resistant strains also depend on
Indeed, for a given value of resistance fitness and a given number of resistant strains, the additional number of resistant strains produced is lower in a case where all the bacteria are already resistant than in case where a subtantial proportion of the bacteria are still non-resistant and can be “replaced” by resistant strains.
From what precedes, the change in the proportion of resistant strains in drug sensitivity tests performed at two different points in time depends on
We here proposed a simple model that allows to estimate the rate \(\rho\) from resistance proportion data.
Installing the required packages:
required <- c("dplyr", "magrittr", "purrr", "tidyr")
to_install <- setdiff(required, row.names(installed.packages()))
if (length(to_install)) install.packages(to_install)
Loading magrittr for interactive use:
library(magrittr)
From Liselotte Diaz Högberg (Liselotte.Diaz-Hogberg@ecdc.europa.eu), on 2019-02-18:
EARS-Net data are available for public download through our Surveillance Atlas, atlas.ecdc.europa.eu/public (choose Antimicrobial resistance). The atlas includes a data export option in the top right corner. This data are open for use as long as the source is acknowledged. As a researcher, you are also welcome to access EARS-Net data through a third party data access request. Please find more information here ecdc.europa.eu/en/publications-data/european-surveillance-system-tessy and do not hesitate to contact me for further details and clarifications.
We select All time periods, All regions, All indicators that we export to the ECDC_surveillance_data_Antimicrobial_resistance.csv file on 2019-10-03
ecdc <- "ECDC_surveillance_data_Antimicrobial_resistance.csv" %>%
read.csv() %>%
dplyr::filter(Unit == "N") %>%
dplyr::select(-HealthTopic, -Unit, -RegionCode, -TxtValue) %>%
tidyr::separate(Population, c("bacteria", "test"), "\\|") %>%
dplyr::filter(!grepl("Combined", test)) %>%
dplyr::mutate_if(is.factor, as.character) %>%
dplyr::mutate_at("NumValue", as.integer) %>%
tidyr::pivot_wider(names_from = Indicator, values_from = NumValue) %>%
dplyr::transmute(bacteria = bacteria,
test = test,
year = Time,
country = RegionName,
N = `Total tested isolates`,
R = `Resistant (R) isolates`,
I = `Non-susceptible (I and R) isolates` - R)
which gives:
ecdc
## # A tibble: 10,925 x 7
## bacteria test year country N R I
## <chr> <chr> <int> <chr> <int> <int> <int>
## 1 Acinetobacter spp. Aminoglycosides 2012 Austria 0 0 0
## 2 Acinetobacter spp. Aminoglycosides 2012 Belgium 0 0 0
## 3 Acinetobacter spp. Aminoglycosides 2012 Bulgaria 65 38 6
## 4 Acinetobacter spp. Aminoglycosides 2012 Cyprus 23 12 0
## 5 Acinetobacter spp. Aminoglycosides 2012 Czech Republic 0 0 0
## 6 Acinetobacter spp. Aminoglycosides 2012 Germany 119 7 0
## 7 Acinetobacter spp. Aminoglycosides 2012 Denmark 77 8 0
## 8 Acinetobacter spp. Aminoglycosides 2012 Estonia 0 0 0
## 9 Acinetobacter spp. Aminoglycosides 2012 Greece 1234 964 118
## 10 Acinetobacter spp. Aminoglycosides 2012 Spain 0 0 0
## # … with 10,915 more rows
where N is the total number of samples, R is the number of samples that are resistant and I is the number of samples for which the resistance level is intermediate.
The ECDC data has data on 8 bacteria:
sort(unique(ecdc$bacteria))
## [1] "Acinetobacter spp." "Enterococcus faecalis" "Enterococcus faecium" "Escherichia coli"
## [5] "Klebsiella pneumoniae" "Pseudomonas aeruginosa" "Staphylococcus aureus" "Streptococcus pneumoniae"
in 30 countries:
sort(unique(ecdc$country))
## [1] "Austria" "Belgium" "Bulgaria" "Croatia" "Cyprus" "Czech Republic" "Denmark"
## [8] "Estonia" "Finland" "France" "Germany" "Greece" "Hungary" "Iceland"
## [15] "Ireland" "Italy" "Latvia" "Lithuania" "Luxembourg" "Malta" "Netherlands"
## [22] "Norway" "Poland" "Portugal" "Romania" "Slovakia" "Slovenia" "Spain"
## [29] "Sweden" "United Kingdom"
over 18 years from 2000 to 2017. The tests of susceptibility that are performed are the following for the different bacteria:
with(ecdc, table(bacteria, test))
## test
## bacteria Aminoglycosides Aminopenicillins Carbapenems Ceftazidime Fluoroquinolones High-level gentamicin Macrolides
## Acinetobacter spp. 180 0 180 0 180 0 0
## Enterococcus faecalis 0 511 0 0 0 511 0
## Enterococcus faecium 0 511 0 0 0 510 0
## Escherichia coli 512 512 510 0 512 0 0
## Klebsiella pneumoniae 386 0 386 0 386 0 0
## Pseudomonas aeruginosa 386 0 386 386 386 0 0
## Staphylococcus aureus 0 0 0 0 0 0 0
## Streptococcus pneumoniae 0 0 0 0 0 0 386
## test
## bacteria Meticillin (MRSA) Penicillins PiperacillinTazobactam Third-generation cephalosporins Vancomycin
## Acinetobacter spp. 0 0 0 0 0
## Enterococcus faecalis 0 0 0 0 511
## Enterococcus faecium 0 0 0 0 511
## Escherichia coli 0 0 0 512 0
## Klebsiella pneumoniae 0 0 0 386 0
## Pseudomonas aeruginosa 0 0 386 0 0
## Staphylococcus aureus 516 0 0 0 0
## Streptococcus pneumoniae 0 386 0 0 0
The number of biological tests performed in the countries naturally increases with time:
with(ecdc, table(year, country))
## country
## year Austria Belgium Bulgaria Croatia Cyprus Czech Republic Denmark Estonia Finland France Germany Greece Hungary Iceland Ireland
## 2000 12 12 1 0 0 1 12 4 12 0 12 12 0 12 12
## 2001 12 12 12 12 0 12 12 12 12 12 12 12 12 12 12
## 2002 12 12 12 12 0 12 12 12 12 12 12 12 12 12 12
## 2003 12 12 12 12 11 12 12 12 12 12 12 12 12 12 12
## 2004 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
## 2005 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2006 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2007 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2008 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2009 23 23 23 0 23 23 23 23 23 23 23 23 23 23 23
## 2010 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2011 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2012 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2013 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2014 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2015 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2016 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2017 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## country
## year Italy Latvia Lithuania Luxembourg Malta Netherlands Norway Poland Portugal Romania Slovakia Slovenia Spain Sweden United Kingdom
## 2000 12 0 0 12 1 12 12 0 12 0 0 1 12 12 12
## 2001 12 0 0 12 12 12 12 12 12 0 11 12 12 12 12
## 2002 12 0 0 12 12 12 12 12 12 12 12 12 12 12 12
## 2003 12 0 0 12 12 12 12 12 12 12 12 12 12 12 12
## 2004 12 1 0 12 12 12 12 12 12 12 12 12 12 12 12
## 2005 23 23 0 23 23 23 23 23 23 23 23 23 23 23 23
## 2006 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2007 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2008 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2009 23 23 23 23 23 23 23 23 23 23 0 23 23 23 23
## 2010 23 23 23 23 23 23 23 23 23 23 0 23 23 23 23
## 2011 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
## 2012 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2013 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2014 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2015 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2016 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
## 2017 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
A function that estimates proportion with confidence intervals:
proportion <- function(x, n, ...) {
if (n < 2) return(rep(NA, 3))
with(binom.test(x, n, ...), c(estimate, conf.int))
}
Using this function to add proportion to all the data:
(ecdc2 <- ecdc %>%
dplyr::mutate(prop = purrr::map2(R, N, proportion)) %>%
tidyr::unnest(prop) %>%
dplyr::mutate(., names = rep_len(c("estimate", "lower", "upper"), nrow(.))) %>%
tidyr::pivot_wider(names_from = names, values_from = prop))
## # A tibble: 10,925 x 10
## bacteria test year country N R I estimate lower upper
## <chr> <chr> <int> <chr> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 Acinetobacter spp. Aminoglycosides 2012 Austria 0 0 0 NA NA NA
## 2 Acinetobacter spp. Aminoglycosides 2012 Belgium 0 0 0 NA NA NA
## 3 Acinetobacter spp. Aminoglycosides 2012 Bulgaria 65 38 6 0.585 0.456 0.706
## 4 Acinetobacter spp. Aminoglycosides 2012 Cyprus 23 12 0 0.522 0.306 0.732
## 5 Acinetobacter spp. Aminoglycosides 2012 Czech Republic 0 0 0 NA NA NA
## 6 Acinetobacter spp. Aminoglycosides 2012 Germany 119 7 0 0.0588 0.0240 0.117
## 7 Acinetobacter spp. Aminoglycosides 2012 Denmark 77 8 0 0.104 0.0459 0.194
## 8 Acinetobacter spp. Aminoglycosides 2012 Estonia 0 0 0 NA NA NA
## 9 Acinetobacter spp. Aminoglycosides 2012 Greece 1234 964 118 0.781 0.757 0.804
## 10 Acinetobacter spp. Aminoglycosides 2012 Spain 0 0 0 NA NA NA
## # … with 10,915 more rows
(ecdc2 <- ecdc %>%
dplyr::mutate(prop = purrr::map2(R, N, proportion)) %>%
tidyr::unnest(prop) %>%
dplyr::mutate(., names = rep_len(c("estimate", "lower", "upper"), nrow(.))) %>%
tidyr::pivot_wider(names_from = names, values_from = prop))
## # A tibble: 10,925 x 10
## bacteria test year country N R I estimate lower upper
## <chr> <chr> <int> <chr> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 Acinetobacter spp. Aminoglycosides 2012 Austria 0 0 0 NA NA NA
## 2 Acinetobacter spp. Aminoglycosides 2012 Belgium 0 0 0 NA NA NA
## 3 Acinetobacter spp. Aminoglycosides 2012 Bulgaria 65 38 6 0.585 0.456 0.706
## 4 Acinetobacter spp. Aminoglycosides 2012 Cyprus 23 12 0 0.522 0.306 0.732
## 5 Acinetobacter spp. Aminoglycosides 2012 Czech Republic 0 0 0 NA NA NA
## 6 Acinetobacter spp. Aminoglycosides 2012 Germany 119 7 0 0.0588 0.0240 0.117
## 7 Acinetobacter spp. Aminoglycosides 2012 Denmark 77 8 0 0.104 0.0459 0.194
## 8 Acinetobacter spp. Aminoglycosides 2012 Estonia 0 0 0 NA NA NA
## 9 Acinetobacter spp. Aminoglycosides 2012 Greece 1234 964 118 0.781 0.757 0.804
## 10 Acinetobacter spp. Aminoglycosides 2012 Spain 0 0 0 NA NA NA
## # … with 10,915 more rows
Let’s plot it:
ecdc2 %>%
dplyr::filter(bacteria == "Escherichia coli",
test == "Third-generation cephalosporins",
country == "Netherlands") %>%
dplyr::arrange(year) %>%
with({
plot(year, estimate, pch = 19, ylim = c(0, max(upper)), col = "red",
type = "o", lty = 2, xlab = NA, ylab = "proportion of resistance")
arrows(year, lower, year, upper, .05, 90, 3, col = "red")
})
Let’s make a function of this pipeline:
plot_resistance <- function(df, bug, drug, country, ylim = NULL, color = "#d95f02", ...) {
require(magrittr)
df %<>% dplyr::filter(bacteria == !! bug,
test == !! drug,
country == !! country)
if (is.null(ylim)) ylim <- c(0, max(df$upper, na.rm = TRUE))
df %>%
dplyr::arrange(year) %>%
with({
plot(year, estimate, pch = 19, ylim = ylim, col = color, type = "o",
lty = 2, axes = FALSE, ann = FALSE, ...)
arrows(year, lower, year, upper, .05, 90, 3, col = color)
})
axis(1)
axis(2, col = color, col.axis = color)
title(ylab = "proportion of resistance", col.lab = color)
}
And include this function into another one that plots the resistance levels of a given bacteria and a given antimicriobial for all the countries:
plot_resistances <- function(df, bug, drug, ylim = NULL, color = "#d95f02", ...) {
opar <- par(mfrow = c(10, 3), plt = c(.15, .99, .2, .9))
for (country in sort(unique(ecdc$country))) {
plot_resistance(df, bug, drug, country, ylim, color, ...)
mtext(country)
}
par(opar)
}
We’ll use this function plot_resistances() at the end of this document, after having performed estimation.
Let’s consider the following model between 2 successive years:
\[ R_{y+1} \sim \mbox{Pois}\left(\left[R_y + \rho R_y \left(N_y - \frac{R_y}{N_y}\right)\right]\frac{N_{y + 1}}{N_y}\right) \]
Where \(N_y\) and \(N_{y + 1}\) are the sample sizes at years \(y\) and \(y+1\), and \(R_y\) and \(R_{y + 1}\) are the number of resistant samples at years \(y\) and \(y+1\).
data <- ecdc %>% dplyr::filter(bacteria == "Escherichia coli",
test == "Third-generation cephalosporins",
country == "Netherlands") %>%
dplyr::arrange(year) %>%
dplyr::select(N, R) %>%
head(2) %>%
as.matrix()
The following function computes the \(\lambda\) parameter of the Poisson distribution, as a funtion of the growth rate \(\rho\) (rho) of the resistant strains compared to the susceptible strain (i.e. \(\rho < 0\) means that the resistant strain spreads less than the susceptible strain, \(\rho = 0\) means that the resistant and susceptible strains spread equally well and \(\rho > 0\) means that the resistant strain spreads more than the susceptible strain), \(N_0\) (N_0) and \(R_0\) (R_0) the number of samples and resistant samples at a given year, and \(N_1\) (N_1) the number of samples the year after:
lambda <- function(rho, R0, N0, N1) {
(R0 + rho * R0 * (N0 - R0 / N0)) * N1 / N0
}
Let’s try it:
lambda(.005, 1, 977, 1446)
## [1] 8.710033
The following function uses the lambda() function to compute the minus log-likelihood of a given values of \(\rho\), given the observed values \(N_0\), \(R_0\), \(N_1\) and \(R_1\):
mll <- function(theta, data) {
-dpois(data[2, 2], lambda(theta, data[1, 2], data[1, 1], data[2, 1]), log = TRUE)
}
Let’s try it:
unname(mll(.01, data))
## [1] 4.393962
The funtion mll() is used by the following function in order to estimate the value of \(\rho\), together with its confidence interval:
estimate <- function(data, mll, interval, alpha = .95) {
if (data[1, 2] < 1) return(rep(NA, 3))
if (data[2, 2] < 2) return(rep(NA, 3))
if (data[1, 1] < 2) return(rep(NA, 3))
opt <- function(...) optimise(..., tol = .Machine$double.eps)
mle <- opt(mll, interval, data)
c(mle$minimum,
opt(function(theta, data) abs(mll(theta, data) - mle$objective - qchisq(alpha, 1) / 2), c(interval[1], mle$minimum), data)$minimum,
opt(function(theta, data) abs(mll(theta, data) - mle$objective - qchisq(alpha, 1) / 2), c(mle$minimum, interval[2]), data)$minimum)
}
Let’s try it:
(mle <- estimate(data, mll, c(-.1, .1)))
## [1] 0.004508967 0.001504724 0.009274910
Let’s illustrate the estimation on a figure:
theta_val <- seq(.001, .01, le = 100)
plot(theta_val, mll(theta_val, data), type = "l", xlab = "parameter value",
ylab = "minus log-likelihood", col = "blue")
abline(v = mle, col = "green", lty = 2)
abline(h = mll(mle, data), col = "green", lty = 2)
The following function chunks a 2-column data frame into a list of 2 x 2 data frames, sliding row-wise a window of 2 rows:
chunk <- function(df) {
lapply(1:(nrow(df) - 1), function(x) as.matrix(df[x:(x + 1), ]))
}
We can put this chunk() function together with the estimate() function into the following pipeline that estimates \(\rho\) with confidence interval for all the years of a data frame:
estimate_all <- function(data, mll, interval) {
require(magrittr)
data %<>% dplyr::arrange(year)
data %>%
dplyr::select(N, R) %>%
chunk() %>%
lapply(estimate, mll, interval) %>%
do.call(rbind, .) %>%
as.data.frame() %>%
setNames(c("estimate", "lower", "upper")) %>%
dplyr::mutate(year = data$year[-length(data$year)] + .5) %>%
dplyr::select(year, dplyr::everything())
}
Let’s try it on the resistance to 3rd-generation cephalosporins in \(E. coli\) in the Netherlands and then plot the results:
ecdc %>% dplyr::filter(bacteria == "Escherichia coli",
test == "Third-generation cephalosporins",
country == "Netherlands") %>%
estimate_all(mll, c(-.1, .1)) %>%
with({
plot(year, estimate, xlab = NA, ylab = "resistance fitness",
ylim = c(min(lower, na.rm = TRUE), max(upper, na.rm = TRUE)), col = "blue", pch = 19)
arrows(year, lower, year, upper, .05, 90, 3, col = "blue")
abline(h = 0, lty = 2)
})
Let’s put the above code into the following function:
plot_fitness <- function(df, bug, drug, country, interval, color = "#7570b3") {
est <- df %>% dplyr::filter(bacteria == !! bug,
test == !! drug,
country == !! country) %>%
estimate_all(mll, interval)
if (all(is.na(est$estimate))) {
plot(NA, ann = F, axes = F, xlim = 0:1, ylim = 0:1)
} else {
with(est, {
plot(year, estimate, xlab = NA, xlim = range(df$year),
ylim = c(min(lower, na.rm = TRUE), max(upper, na.rm = TRUE)), col = color, pch = 19, type = "o",
lty = 2, axes = FALSE, ann = FALSE)
arrows(year, lower, year, upper, .05, 90, 3, col = color)
})
abline(h = 0, lty = 2)
axis(4, col = color, col.axis = color)
mtext("resistance fitness", 4, 1.5, cex = 2 / 3, col = color)
}
}
Let’s combine the value of resistance prevalence and estimated resistance fitness on the same figure:
plot_resistance(ecdc2, "Escherichia coli", "Third-generation cephalosporins", "Netherlands")
par(new = TRUE)
plot_fitness(ecdc, "Escherichia coli", "Third-generation cephalosporins", "Netherlands", c(-.1, .1))
Let’s include the above code in the following function:
plot_res_fit <- function(df, bug, drug, ylim = NULL, interval = c(-.1, .1)) {
opar <- par(mfrow = c(10, 3), plt = c(.13, .88, .2, .9))
for (country in sort(unique(ecdc$country))) {
plot_resistance(df, bug, drug, country, ylim, xlim = c(2000, 2017))
par(new = TRUE)
plot_fitness(df, bug, drug, country, interval)
mtext(country)
}
par(opar)
}
Let’s now apply this function for a number of situations
plot_res_fit(ecdc2, "Escherichia coli", "Third-generation cephalosporins")
plot_res_fit(ecdc2, "Klebsiella pneumoniae", "Third-generation cephalosporins")
plot_res_fit(ecdc2, "Klebsiella pneumoniae", "Carbapenems")
plot_res_fit(ecdc2, "Pseudomonas aeruginosa", "Carbapenems")
plot_res_fit(ecdc2, "Pseudomonas aeruginosa", "Ceftazidime")
plot_res_fit(ecdc2, "Staphylococcus aureus", "Meticillin (MRSA)")
The following binomial model is expected to produce exactly the same results (to be checked):
\[ R_{y+1} \sim \mbox{Bin}\left(N_{y+1}, \frac{R_y}{N_y} + \rho R_y \left(1 - \frac{R_y}{N_y^2}\right)\right) \]